Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | في | 26 | هو |
2 | من | 27 | بن |
3 | على | 28 | أنه |
4 | أن | 29 | كانت |
5 | إلى | 30 | وفي |
6 | التي | 31 | حيث |
7 | عن | 32 | حتى |
8 | ما | 33 | خلال |
9 | لا | 34 | قبل |
10 | الذي | 35 | ولا |
11 | مع | 36 | عام |
12 | أو | 37 | قد |
13 | هذا | 38 | وقد |
14 | هذه | 39 | أي |
15 | كان | 40 | هي |
16 | إن | 41 | أكثر |
17 | بعد | 42 | وهو |
18 | الله | 43 | غير |
19 | ذلك | 44 | وقال |
20 | لم | 45 | لكن |
21 | كل | 46 | هناك |
22 | بين | 47 | ومن |
23 | كما | 48 | عليه |
24 | قطر | 49 | له |
25 | ، | 50 | منذ |
The table shows the 50 most frequent words of the corpus. Typically, such a list consists mostly of stopwords.
Language: Arabic
Such a list is a good starting point for a first stopword list for a language.
A small, balanced corpus is usually enough to obtain a good list of high-frequency words. If a small corpus is dominated by a prominent topic, however, that topic shows up even in the top word list (here, for example, قطر "Qatar" at rank 24).
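As a rough illustration of how a top-N frequency list turns into stopword candidates, the sketch below counts tokens and keeps the most frequent ones. It is a minimal sketch, assuming naive whitespace tokenization; the helper name `stopword_candidates` and the toy English sentences are hypothetical and not the procedure used to build this corpus. The word "qatar" plays the role of a prominent topic that leaks into the top of the list.

```python
from collections import Counter


def stopword_candidates(sentences: list[str], n: int = 50) -> list[str]:
    """Return the n most frequent tokens as stopword candidates.

    Hypothetical helper: splitting on whitespace is an assumption;
    real corpora need a proper tokenizer.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    # most_common sorts by descending count; equal counts keep first-seen order
    return [word for word, _ in counts.most_common(n)]


# Toy corpus: "qatar" stands in for a dominant topic of a small corpus
corpus = [
    "the news from qatar is in the paper",
    "the minister of qatar spoke in the city",
    "a report from the capital of qatar",
]
print(stopword_candidates(corpus, n=5))  # → ['the', 'qatar', 'from', 'in', 'of']
```

The output mixes function words with the topic word, which is why such a list is only a first candidate and still needs manual review.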
SELECT w_id - 100 AS rank_in_wordlist, word FROM words WHERE w_id > 100 ORDER BY w_id LIMIT 50;
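The query can be tried on a toy database with Python's built-in `sqlite3` module. This sketch assumes, based only on the query itself, a `words` table with columns `w_id` and `word`, where IDs above 100 are assigned in order of descending frequency so that `w_id - 100` yields the rank. The reserved rows below 101, the sample data and the helper name `top_words` are hypothetical.

```python
import sqlite3

# The query from the text, with the list length passed as a parameter
TOP_WORDS_SQL = """
SELECT w_id - 100 AS rank_in_wordlist, word
FROM words
WHERE w_id > 100
ORDER BY w_id
LIMIT ?
"""


def top_words(conn: sqlite3.Connection, n: int = 50) -> list[tuple[int, str]]:
    """Return (rank, word) pairs for the n most frequent words (hypothetical helper)."""
    return conn.execute(TOP_WORDS_SQL, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
# Assumed schema: only the two columns the query uses
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")
rows = [
    (1, "<reserved-1>"), (2, "<reserved-2>"),  # hypothetical entries excluded by w_id > 100
    (101, "في"), (102, "من"), (103, "على"), (104, "أن"),
]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

print(top_words(conn, n=3))  # → [(1, 'في'), (2, 'من'), (3, 'على')]
```

Binding the limit as a parameter keeps the same query usable for the top 50 or any other list length.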
3.4 Sample words for different frequency ranges